FBB: a fast Bayesian-bound tool to calibrate RNA-seq aligners
Abstract
MOTIVATION: Although RNA-seq reads come with quality scores that represent the probability of a correct base call, most alignment algorithms do not integrate these values probabilistically. Based on the quality scores of the reads, we propose to calculate a lower bound on the probability of alignment for any fast alignment algorithm that generates SAM files. This bound, called the Fast Bayesian Bound (FBB), serves as a canonical reference for comparing alignment results across different algorithms. The FBB is intended to complement current state-of-the-art aligners, not to replace them.

RESULTS: We propose a feasible Bayesian bound that uses the quality scores of the reads to align them to a reference genome. Two theorems are provided to calculate the bound efficiently; under some conditions, the bound becomes an equality. The algorithm reads the SAM files generated by the alignment algorithms run with multiple command-option values. The program options are mapped onto FBB reference values, so all the aligners can be compared with respect to the same accuracy values provided by the FBB. Stranded paired-end RNA-seq data were used for evaluation. Alignment errors can be calculated from the information contained in the distance between the read pairs, given by Theorem 2, and from alignments to the incorrect strand. Most of the algorithms (Bowtie, Bowtie 2, SHRiMP2, Soap 2, Novoalign) give similar results with subtle variations.

AVAILABILITY AND IMPLEMENTATION: The current version of the FBB software is available at https://bitbucket.org/irenerodriguez/fbb

CONTACT: [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
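The abstract builds on the fact that each base's quality score encodes a probability. The sketch below illustrates only that starting point: it converts ASCII Phred quality characters into per-base error probabilities and computes the probability that every base in a read was called correctly. It assumes Phred+33 encoding (Sanger / Illumina 1.8+) and independent base-call errors. The function names are hypothetical, and this is not the FBB or either of the paper's theorems.

```python
import math


def phred_to_error_prob(qual_char: str, offset: int = 33) -> float:
    """Convert one ASCII-encoded Phred quality character to the probability
    that the base call is wrong, using Q = -10 * log10(P_error).
    Assumes Phred+33 encoding by default (hypothetical helper)."""
    q = ord(qual_char) - offset
    if q < 0:
        raise ValueError(f"quality character {qual_char!r} is below the offset {offset}")
    return 10 ** (-q / 10)


def prob_all_bases_correct(qual_string: str, offset: int = 33) -> float:
    """Probability that every base call in a read is correct, ASSUMING
    independent errors across positions (hypothetical helper).
    Accumulates in log space for numerical stability on long reads."""
    log_p = 0.0
    for c in qual_string:
        p_err = phred_to_error_prob(c, offset)
        if p_err >= 1.0:
            # Q0 ('!' in Phred+33): the call carries no information.
            return 0.0
        log_p += math.log1p(-p_err)
    return math.exp(log_p)


if __name__ == "__main__":
    # Four Q40 bases ('I'): each has error probability 1e-4.
    print(prob_all_bases_correct("IIII"))
```

For example, the quality string "IIII" (four Q40 bases) yields about 0.9996. An aligner that ignores these values treats every mismatch the same way regardless of whether it falls on a Q40 or a Q2 base, which is the gap the abstract's motivation points to.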
Similar resources
DART: a fast and accurate RNA-seq mapper with a partitioning strategy
Motivation In recent years, massively parallel cDNA sequencing (RNA-Seq) technologies have become a powerful tool to provide high resolution measurement of expression and high sensitivity in detecting low abundance transcripts. However, RNA-seq data require a huge amount of computational effort. The very fundamental and critical step is to align each sequence fragment against the referenc...
RNA-Seq Bayesian Network Exploration of Immune System in Bovine
Background: Stress is one of the main factors affecting production systems. Several factors (both genetic and environmental elements) regulate the immune response to stress. Objectives: In order to determine the major immune system regulatory genes underlying stress responses, a Bayesian network learning approach for those regulatory genes was applied to RNA-...
CADBURE: A generic tool to evaluate the performance of spliced aligners on RNA-Seq data
The fundamental task in RNA-Seq-based transcriptome analysis is alignment of millions of short reads to the reference genome or transcriptome. Choosing the right tool for the dataset in hand from many existent RNA-Seq alignment packages remains a critical challenge for downstream analysis. To facilitate this choice, we designed a novel tool for comparing alignment results of user data based on ...
Evaluation of Alignment Algorithms for Discovery and Identification of Pathogens Using RNA-Seq
Next-generation sequencing technologies provide an unparalleled opportunity for the characterization and discovery of known and novel viruses. Because viruses are known to have the highest mutation rates when compared to eukaryotic and bacterial organisms, we assess the extent to which eleven well-known alignment algorithms (BLAST, BLAT, BWA, BWA-SW, BWA-MEM, BFAST, Bowtie2, Novoalign, GSNAP, ...
A Comprehensive Evaluation of Alignment Algorithms in the Context of RNA-Seq
Transcriptome sequencing (RNA-Seq) overcomes limitations of previously used RNA quantification methods and provides one experimental framework for both high-throughput characterization and quantification of transcripts at the nucleotide level. The first step and a major challenge in the analysis of such experiments is the mapping of sequencing reads to a transcriptomic origin including the iden...
Journal: Bioinformatics
Volume 33, Issue 2
Publication date: 2017